Where Anaphora and Coreference Meet. Annotation in the Spanish CESS-ECE Corpus

Authors

  • Marta Recasens
  • M. Antònia Martí
  • Mariona Taulé
Abstract

This paper describes the guidelines of the annotation scheme designed to enrich the Spanish CESS-ECE corpus with coreference information, which is a significant step towards the definition of an exhaustive typology of pronominal and full NP coreferential expressions and their relations for Spanish. The goal is twofold. From a computational perspective, this work establishes the formal foundations for the construction of the largest corpus of Spanish texts annotated from the morphological to the pragmatic level. This corpus, which will be publicly released, will be used to construct an automatic corpus-based coreference resolution system. From a linguistic point of view, hypotheses on coreferential expressions will be tested and validated on this framework.

Similar resources

Text as Scene: Discourse Deixis and Bridging Relations

This paper presents a new framework, “text as scene”, which lays the foundations for the annotation of two coreferential links: discourse deixis and bridging relations. The incorporation of what we call textual and contextual scenes provides more flexible annotation guidelines, broad type categories being clearly differentiated. Such a framework that is capable of dealing with discourse deixis ...

Anotación semiautomática con papeles temáticos de los corpus CESS-ECE

In this paper we present the methodology followed in the automatic semantic annotation (argument structure and thematic roles of the verbal predicates) of the CESS-ECECAT/ESP corpus. Building from a verbal lexicon (1,482 entries) with information about the syntactic functions and their projection to arguments and thematic roles, we present a set of simple rules to automatically enrich syntactic...

Exploiting Semantic Information For Manual Anaphoric Annotation In Cast3LB Corpus

This paper presents the discourse annotation followed in Cast3LB, a Spanish corpus annotated with several information sources (morphological, syntactic, semantic and coreferential) at the syntactic, semantic and discourse levels. The 3LB annotation scheme has been developed for three languages (Spanish, Catalan and Basque). Human annotators have used a set of tagging techniques and protocols. Several to...

WikiCoref: An English Coreference-annotated Corpus of Wikipedia Articles

This paper presents WikiCoref, an English corpus annotated for anaphoric relations, where all documents are from the English version of Wikipedia. Our annotation scheme follows the one of OntoNotes with a few disparities. We annotated each markable with coreference type, mention type and the equivalent Freebase topic. Since most similar annotation efforts concentrate on very specific types of w...

Coreference Resolution In Dialogues In English And Portuguese

This paper introduces a methodology to analyse and resolve cases of coreference in dialogues in English and Portuguese. A four-attribute annotation to analyse cases of anaphora was used to analyse a sample of around three thousand cases in each language collected in dialogue corpora. The information thus gathered was analysed by means of exploratory and model-building statistical procedures. A ...

Publication date: 2007